Assessing the quality and reliability of YouTube videos as a source of information on inflammatory back pain

Background. Inflammatory back pain (IBP) is a chronic condition characterized by localized pain, particularly in the axial spine and sacroiliac joints, that is associated with morning stiffness and improves with exercise. YouTube is the second most frequently used social media platform for accessing health information. This study sought to investigate the quality and reliability of YouTube videos on IBP.
Methods. The study was cross-sectional. A search was conducted on October 19, 2023, using the term “inflammatory back pain,” and the first 100 videos that met the inclusion criteria were selected and examined. Videos were included if they were in English, had audiovisual content, had a duration >30 s, were not duplicates, and had primary content related to IBP. Video parameters such as the number of likes, number of views, duration, and content categories were assessed. The videos were assessed for reliability using the Journal of the American Medical Association (JAMA) Benchmark criteria and the DISCERN tool. Quality was assessed using the Global Quality Score (GQS). Continuous variables were checked for normality of distribution using the Shapiro–Wilk and Kolmogorov–Smirnov tests. The Kruskal–Wallis test and the Mann–Whitney U test were used to analyze continuous data, depending on the number of groups. Categorical data were analyzed using Pearson’s chi-square test.
Results. Reliability assessment based on JAMA scores showed 21% of the videos to have high reliability. Quality assessment based on GQS results showed 19% of the videos to have high quality. JAMA, DISCERN, and GQS scores differed significantly by source of video (p < 0.001, p < 0.001, and p = 0.002, respectively). Video duration had a moderate positive correlation with scores from the GQS (r = 0.418, p < 0.001), JAMA (r = 0.484, p < 0.001), and modified DISCERN (r = 0.418, p < 0.001).
Conclusion. The results of the present study showed that YouTube offers videos of low reliability and low quality on inflammatory back pain. Health authorities have a responsibility to protect public health and should take proactive steps regarding health information shared on social media platforms.

evidence (Maia et al., 2021a; Maia et al., 2021b). AlMuammar et al. (2021) found that 92.6% of the participants in their study used the Internet to seek medical information and that 42% of them accessed information from Internet sources to avoid going to the hospital. The same study also reported YouTube to be the second most frequently used social media platform for accessing health information. Another study reported that 54% of patients searched for information on their disease before doctor visits and that misinformation has a direct impact on patient decision-making and the patient-physician relationship (Hornung et al., 2022). Thus, providing accurate information to patients is critical. The positive effects of YouTube on medical education and on patient education about specific conditions are undeniable (Sampson et al., 2013). However, the possibility of misinformation being shared and of hidden industry influence raises some concerns (Syed-Abdul et al., 2013; Freeman, 2012). As the popularity of online platforms such as YouTube continues to grow, so do concerns over the overall quality and reliability of videos uploaded to these platforms. Previous studies have reviewed the reliability and quality of YouTube videos on topics such as low back pain (LBP), lumbar disc herniation, and epidural steroid injection (Chang & Park, 2021; Maia et al., 2021a; Maia et al., 2021b; Mohile et al., 2023).
LBP-related information presented on YouTube has been shown to include content that is not evidence-based, with a tendency to prioritize information about invasive methods rather than information on the LBP process itself (Maia et al., 2021a; Maia et al., 2021b). In the study on epidural steroid injection, the reliability and quality of the content were found to be low, even in videos uploaded by doctors and hospitals (Chang & Park, 2021). By contrast, the content, reliability and quality of YouTube videos about IBP remain unclear, as there is limited information on YouTube content related to IBP. Conducting such a study will not only shed light on the reliability and quality of the information that patients seek about IBP on YouTube, but may also encourage clinicians and researchers to study how other conditions are presented on social media platforms and so identify issues that threaten public health. The present study sought to assess the reliability and quality of YouTube videos on IBP and to identify sources that provide more reliable information.

Study design, setting and video selection
In this cross-sectional study, the first 100 videos that satisfied the inclusion and exclusion criteria were examined. On October 19, 2023, a search was conducted on YouTube (https://www.youtube.com) using the term “inflammatory back pain.” A neutral term was used to create a large pool of videos (Barlas et al., 2023; Ozduran & Büyükçoban, 2022). The videos were selected by two authors (M.K. and E.O.), and any inconsistency in video assessment was resolved by a third author (M.M.K.), who made the final decision. The authors deleted cookies and Internet search history, signed out of their Google accounts, and used Google's incognito mode to search YouTube (Maia et al., 2021a; Maia et al., 2021b; Ozduran & Büyükçoban, 2022). Videos were included if they were in English, had audiovisual content, had a duration >30 s, were not duplicates, and had primary content related to IBP (Barlas et al., 2023; Chang & Park, 2021). Videos were excluded if they were not in English or had no audiovisual content (Chang & Park, 2021), or if they had a duration <30 s, were duplicates, or had primary content unrelated to IBP (Barlas et al., 2023). Videos were sorted by “view count” on the rationale that the most viewed videos were more likely to be accessed by users seeking information in a specific area (Chang & Park, 2021). After a detailed assessment based on the inclusion and exclusion criteria, the first 100 videos were included in the study, in line with the literature (Basch et al., 2021; Manchaiah et al., 2020). The present study follows the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) reporting guidelines (von Elm et al., 2007).
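The selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual workflow: the `Video` record, its field names, and the use of a video ID to detect duplicates are hypothetical, and a duration of exactly 30 s (which the criteria leave unspecified) is treated here as excluded.

```python
from dataclasses import dataclass


@dataclass
class Video:
    # Hypothetical record; field names are illustrative, not from the study.
    video_id: str
    language: str          # e.g. "en"
    has_audiovisual: bool
    duration_s: int
    about_ibp: bool        # primary content related to IBP
    views: int


def select_videos(candidates, limit=100):
    """Apply the stated criteria, then keep the `limit` most viewed videos."""
    seen_ids = set()
    eligible = []
    # Sort by view count, most viewed first, as in the study design.
    for video in sorted(candidates, key=lambda v: v.views, reverse=True):
        if video.video_id in seen_ids:
            continue  # duplicate: only the first copy is considered
        seen_ids.add(video.video_id)
        if (
            video.language == "en"
            and video.has_audiovisual
            and video.duration_s > 30   # assumption: exactly 30 s is excluded
            and video.about_ibp
        ):
            eligible.append(video)
    return eligible[:limit]
```

Calling `select_videos(pool)` on a list of candidate records returns at most 100 eligible videos in descending order of views.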

Content categories
The videos were thoroughly analyzed in terms of content type relating to IBP. Content was divided into six categories, and each video was checked for the presence of each category. These content categories were: (1) etiology, (2) symptoms, (3) physical examination, (4) diagnosis, (5) differential diagnosis, and (6) treatment.

Reliability assessment
The videos were assessed for reliability based on the Journal of the American Medical Association (JAMA) Benchmark criteria. The JAMA Benchmark criteria are an objective set of guidelines used for assessing the reliability of online resources such as websites, videos, and podcasts. Under these criteria, videos were assessed on four categories: (1) authorship, (2) disclosure, (3) currency, and (4) attribution. Each category was assigned 1 point, yielding a final score of 0 to 4. A video with a higher JAMA score was considered to be more reliable. Thus, videos with a JAMA score ≤2 points are considered to have low reliability and videos with ≥3 points are considered to have high reliability (Silberg, Lundberg & Musacchio, 1997) (Table 1). In assessing JAMA results, videos with a score of 0 or 1 are considered to contain insufficient data, videos with a score of 2 or 3 are considered to contain partially sufficient data, and videos with a score of 4 are considered to contain completely sufficient data (Ozduran & Büyükçoban, 2022).
The videos were also assessed for reliability using the modified DISCERN scale. The tool was created by the Public Health and Primary Care Division of Oxford University in 1999 under its original name, Quality Criteria for Consumer Health Information, to assess the quality of information and treatment options for health problems (Rodriguez-Rodriguez et al., 2022). The DISCERN tool consists of five criteria, where the video is assigned one point if it meets the relevant criterion and zero points if it does not. The final score ranges from 0 to 5, with higher scores representing higher reliability (Chang & Park, 2021) (Table 1). Validity and reliability evaluations have been made for the JAMA and DISCERN scales (Silberg, Lundberg & Musacchio, 1997; Charnock et al., 1999).
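The scoring rules in this section reduce to simple arithmetic, which the following Python sketch makes explicit. The function names are hypothetical and chosen only for illustration; the cut-offs are those stated in the text (JAMA ≥3 for high reliability; 0 to 1, 2 to 3, and 4 for insufficient, partially sufficient, and completely sufficient data).

```python
def jama_score(authorship, disclosure, currency, attribution):
    """One point per JAMA Benchmark criterion met, giving a score of 0 to 4."""
    return sum(bool(c) for c in (authorship, disclosure, currency, attribution))


def jama_reliability(score):
    """High reliability for a score of 3 or more, low reliability otherwise."""
    if not 0 <= score <= 4:
        raise ValueError("JAMA score must be between 0 and 4")
    return "high" if score >= 3 else "low"


def jama_sufficiency(score):
    """Three-level reading of the same score, as described in the text."""
    if not 0 <= score <= 4:
        raise ValueError("JAMA score must be between 0 and 4")
    if score <= 1:
        return "insufficient"
    if score <= 3:
        return "partially sufficient"
    return "completely sufficient"


def modified_discern_score(criteria_met):
    """One point per modified DISCERN criterion met, giving a score of 0 to 5."""
    criteria_met = list(criteria_met)
    if len(criteria_met) != 5:
        raise ValueError("modified DISCERN has exactly five criteria")
    return sum(bool(c) for c in criteria_met)
```

Note that the two JAMA readings overlap: a video scoring 3 has high reliability under the first rule but only partially sufficient data under the second.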

Quality assessment
The Global Quality Score (GQS) is used to assess the quality of resources available online. In this scoring, each criterion is worth 1 point, and a total score of 5 indicates excellent quality. Content with a final score of 4 or 5 is considered to be of high quality, a score of 3 indicates moderate quality, and a score of 1 or 2 indicates poor quality (Chang & Park, 2021) (Table 1). The GQS reflects the accessibility and quality of information, as well as its potential usefulness for any user (Rodriguez-Rodriguez et al., 2022). Validity and reliability evaluations have been made for the GQS (Bernard et al., 2007).
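As a minimal sketch, the GQS banding described above can be written as a single function. The name `gqs_category` is hypothetical, and the labels simply mirror the wording used in the text.

```python
def gqs_category(score):
    """Map a GQS of 1 to 5 onto the quality bands described in the text."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("GQS must be an integer from 1 to 5")
    if score >= 4:
        return "high"
    if score == 3:
        return "moderate"
    return "poor"
```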

Video sources
Video sources were classified into academic institutions, health-related websites, professional organizations/societies, physicians, patients, news channels, and commercial and nonprofit organizations. The presence/absence of animation content in the videos and the country and continent of origin were also recorded.

Statistical analysis
The study data were analyzed using SPSS (Statistical Package for Social Sciences, Chicago, IL, USA) 24.0 software. Continuous data were presented as mean ± standard deviation (SD), and categorical data were presented as number (n) and percentage (%). Continuous variables were checked for normality of distribution using the Shapiro-Wilk and Kolmogorov-Smirnov tests. The Kruskal-Wallis test and the Mann-Whitney U test were used to analyze continuous data, depending on the number of groups. Categorical data were analyzed using Pearson's chi-square test.
Correlations between variables were analyzed using Pearson's correlation test. Statistical significance was set at p < 0.05.
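For readers unfamiliar with the correlation coefficient used here, the following standard-library Python sketch computes Pearson's r from its definition. It is illustrative only: the study used SPSS, the function name is hypothetical, and no p-value is computed.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

In this study, `x` would hold, for example, video durations and `y` the corresponding GQS, JAMA, or modified DISCERN scores.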

RESULTS
YouTube was searched using the search term, and the first 100 videos with the highest view counts that met the inclusion criteria were included in the study. The videos were thoroughly reviewed against the exclusion criteria, resulting in the exclusion of 10 non-English videos, five irrelevant videos, one repeated video, eight videos under 30 s, and 12 videos with inadequate audiovisual content (Fig. 1). Of the repeated videos, only one was included in the assessment. The total duration of the included video content was 21 h, 45 min, and 50 s. The shortest and the longest videos had durations of 36 s and 1 h, 27 min, and 26 s, respectively. The least and the most viewed videos had 53 views and 11 million views, respectively. The videos with the lowest and highest numbers of likes had zero likes and 133,000 likes, respectively. The minimum and maximum numbers of comments were zero and 4,720, respectively. Only 29% (n = 29) of the videos had animation content. The mean numbers of likes and dislikes for all the videos were 4,672.12 ± 18,725.93 and 115.32 ± 450.82, respectively. Mean views, comments, duration, and Video Power Index (VPI) were 287,700.93 ± 1,253,532.21, 167.29 ± 595.21, 783.5 ± 1,063.36, and 96.87 ± 5.22, respectively. The period from 2020 onwards had the highest number of uploads (54% of all videos). Analysis of the sources showed that the top two sources of upload were health-related websites (n = 31) and professional organizations/societies (n = 21) (Fig. 2).
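The duration figures above can be cross-checked with simple arithmetic. The sketch below assumes that the reported mean duration of 783.5 is expressed in seconds, which the text does not state explicitly; the helper name is hypothetical.

```python
def hms_to_seconds(hours, minutes, seconds):
    """Convert a duration given as hours, minutes, seconds to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Total duration of the 100 included videos: 21 h, 45 min, 50 s.
total_s = hms_to_seconds(21, 45, 50)

# Dividing by the number of videos gives the mean duration per video.
mean_s = total_s / 100

# Longest included video: 1 h, 27 min, 26 s.
longest_s = hms_to_seconds(1, 27, 26)
```

Here `total_s` is 78,350 s and `mean_s` is 783.5 s, which agrees with the reported mean and supports reading that value in seconds (about 13 min per video).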
As for the analysis of content categories, the two most common topics covered in the videos were symptoms (75%) and treatment (59%). Videos with diagnosis-related content were uploaded in significantly higher numbers from 2020 onwards (p = 0.037). Analysis of other content categories and animation content did not show any significant difference by year (Table 2).
In the reliability and quality assessment, inter-rater agreement was statistically significant for the JAMA, DISCERN and GQS evaluations (κ = 0.94, p < 0.001; κ = 0.96, p < 0.001; κ = 0.94, p < 0.001, respectively). Reliability was assessed using the JAMA and modified DISCERN tools. Mean ± SD values of JAMA and modified DISCERN scores were 1.88 ± 1.04 and 37.1 ± 20.37, respectively. When JAMA scores were categorized into high and poor reliability, 21 (21%) and 79 (79%) videos were of high and poor reliability, respectively. Assessment based on the DISCERN tool showed ten (10%) videos to be excellent and nine (9%) videos to be good. Quality assessment using the GQS yielded a mean ± SD score of 2.13 ± 1.2, and only 19 (19%) of the videos were of high quality (Table 3).
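The agreement statistic κ reported above can be illustrated with a short Python sketch. The text does not name the variant used; Cohen's kappa for two raters is assumed here, and the function name and any ratings passed to it are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of items given identical ratings.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so the reported values of 0.94 to 0.96 correspond to near-perfect agreement between raters.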

Reliability and quality assessment results differed significantly by video source. JAMA, DISCERN, and GQS results exhibited significant differences in connection with video source (p < 0.001, p < 0.001, and p = 0.002, respectively). Of the videos considered completely sufficient according to JAMA results, 33.3% (n = 4) had been uploaded by academic institutions and another 33.3% (n = 4) by professional organizations/societies. Of the videos with insufficient data, 36.5% (n = 27) had been uploaded by health-related websites and 20.3% (n = 15) by commercial sources. Of the videos found to have good and excellent reliability according to the modified DISCERN score, 47.4% (n = 9) had been uploaded by professional organizations/societies and 21.1% (n = 4) by academics. As for videos considered to be of high quality according to the GQS results, 36.8% (n = 7) had been uploaded by professional organizations/societies and 21.1% (n = 4) by academics.
Of the low-quality videos, on the other hand, 36.5% had been uploaded by health-related websites and 20.3% (n = 15) by commercial sources. According to the JAMA, modified DISCERN, and GQS scores, all videos uploaded by academics were of high quality, contained completely sufficient data, and had excellent reliability. In contrast, all commercial videos were of low quality, contained insufficient data, and had very poor reliability (Table 3). JAMA reliability scores were compared with video parameters, and only duration was associated with a significant difference (p < 0.001). Similarly, comparison of the modified DISCERN scores and video parameters showed only duration to be associated with a significant difference (p < 0.001). Videos with short duration had lower reliability scores on both the modified DISCERN and JAMA scales. Assessment of quality scores (GQS) by video parameter showed statistically significant differences in views (p = 0.022), dislikes (p = 0.044), duration (p = 0.004), and VPI (p < 0.001) (Table 4).
Comparison between the countries of origin of videos and video parameters revealed a significant difference in views (p = 0.008), likes (p = 0.013), dislikes (p = 0.011), and VPI (p = 0.041). Videos that originated from Canada had higher values for view count, likes, and dislikes than videos uploaded from other countries. Videos originating from the UK, on the other hand, had significantly higher VPI values. Assessment of the videos by continent of origin showed that videos originating from the American continent differed significantly from videos from other continents in terms of views (p = 0.021), likes (p = 0.046), and comments (p = 0.037) (Table 6).

DISCUSSION
In the present study, YouTube videos on IBP were assessed in terms of user engagement criteria, reliability, content categories, and quality. The study focused on YouTube content on IBP, a symptom that may cause life-threatening health problems and negatively affect public health in case of delayed diagnosis. Chronic back pain is a common symptom leading to deterioration of health and use of health resources in the community. Four out of five individuals are known to suffer from back pain at some point in their lives. In some, these symptoms resolve over time, while in others, they become chronic (Nieminen, Pyysalo & Kankaanpää, 2021). The axial spondyloarthritis (axSpA) group of diseases, which present with chronic IBP, lack precise symptoms at the time of initial presentation, have a slow or delayed progression, lack reliable diagnostic tests, and have low prevalence in the community, which may delay diagnosis by up to 8-10 years (Magrey et al., 2020; Sykes et al., 2015). Delayed diagnosis may lead to deterioration in patients' quality of life and increased economic burden, as well as severe disability due to untreated disease (Vangeli et al., 2015; Juanola Roura et al., 2015). The ASAS criteria have guided the diagnosis of axSpA by detecting the different signs and symptoms, including IBP, that manifest in the early stages of the disease. Further, general practitioners, usually the first point of contact for patients, have good knowledge of IBP symptoms but not of disease-specific features and exhibit modest confidence in evaluating patients with IBP, which may cause delays in the diagnosis of the disease (Aljohani, Barradah & Kashkari, 2022). A number of factors, including chronicization of symptoms and lack of response to treatment, prompt patients to turn to popular social media platforms to seek information on their disease. YouTube, a platform with a billion views per month, has become a medium that influences individuals' decision-making about their health (Maia et al., 2021a; Maia et al., 2021b).
The present study sought to investigate whether YouTube provides reliable, high-quality, and accurate information to individuals searching that platform for information on IBP. The results showed that more videos on IBP were uploaded to YouTube from 2020 onwards than in any other time period, and that diagnosis-related content has become a more common topic in videos uploaded in recent years. The study also showed that the highest proportion of the videos were uploaded by health-related websites (31%), and that the JAMA, DISCERN, and GQS scores were higher for videos uploaded by professional organizations/societies and academics but lower for videos uploaded by health-related websites and commercial sources. Videos with high JAMA and DISCERN scores had longer durations.
The top two sources of upload for the videos investigated in this study were health-related websites (31%) and professional organizations/societies (21%). This result is consistent with some previous studies that found health-related websites to be the leading source of videos on various topics (Duman, 2020; Onder, Onder & Zengin, 2022). YouTube channels managed by health-related websites are often created by nonphysicians and are frequently used to upload videos on popular topics.
Analysis of content categories in the present study showed that the most frequently mentioned topic in the videos was IBP symptoms, followed by treatment-related content. This is consistent with previous studies that found symptom content to be the most common topic in videos (Tang, Olscamp & Choi SK, 2017; Ozsoy-Unubol & Alanbay-Yagci, 2021). For instance, Ozsoy-Unubol & Alanbay-Yagci (2021) investigated YouTube videos on fibromyalgia, another rheumatic disease, and emphasized that symptom- and treatment-related content were the most common categories mentioned in videos. This suggests that in videos on rheumatic diseases, disease symptoms and treatment-related issues attract viewers' attention the most, which prompts video uploaders to focus on these topics. Another interesting finding in our study is the increase in diagnosis-related content in videos uploaded in recent years. This can be explained by increasing awareness of the loss of health caused by delayed diagnosis in people with IBP symptoms, resulting in diagnosis-related content being mentioned more frequently in recent videos.
Lombo-Moreno et al. (2023) found high DISCERN scores in videos uploaded by professional organizations. Aglamis, Senel & Koudonas (2023) found high GQS and DISCERN scores for videos uploaded by professional organizations and academic institutions. Wu et al. (2022) reported high JAMA and DISCERN scores in videos from professional organizations and universities. Our study found similar results in that it observed high JAMA, DISCERN, and GQS scores in videos from professional organizations and academic sources. In contrast, reliability and quality scores were low in videos uploaded by commercial sources and health-related websites. This study found health-related websites to be the leading source of videos, and this result indicates a need for videos containing reliable and quality information to protect public health and provide accurate information for viewers. Thus, YouTube should prevent misinformation by establishing rigorous control mechanisms to filter out videos of low quality and poor reliability.
We found video duration to have a positive moderate correlation with the JAMA, DISCERN, and GQS scores. The literature emphasizes that videos with high reliability and quality scores have longer durations (Kyarunts et al., 2022). Viewers seeking accurate and quality information should thus be skeptical of short videos. The present study found, as in previous studies, that the JAMA, DISCERN, and GQS scores had a moderate correlation with one another (Bolac, Ozturk & Yildiz, 2022). This can be explained by the fact that quality videos are reliable and reliable videos offer quality information.
Breakdown of the countries and continents of origin in the present study showed that most of the videos were uploaded by YouTube channels originating from the USA (64%) and the American continent (69%). Similarly, previous studies reported that most of the videos originated from the USA (Li, Giuliani & Ingledew, 2021). Results also showed that videos originating from the American continent had a significant association with the view count, likes, and comments. This can be explained by the fact that YouTube, an organization originating in the USA, is actively used by users from the USA to provide information to the whole world, and this translates into engagement in terms of certain video parameters. Thus, YouTube seems to assume an important function by conveying health information and has a direct impact on public health.

Limitations
This study has some limitations: the exclusion of videos in non-English languages, the possibility that a cross-sectional study would yield different results in another time period, and the element of subjective assessment by the authors despite the use of objective questionnaires. Although the search in this study was conducted using Google's incognito mode, YouTube's unique video ranking feature can be considered another limitation.

Strengths of this study
Although YouTube content on mechanical back pain has been studied, there are no YouTube studies on inflammatory back pain. Our study can therefore serve as a guide for patients who are looking for content on YouTube regarding inflammatory back pain. In addition, studies on social media are of great importance for protecting public health. As technology develops, such studies will be needed more in the future to ensure that individuals have access to accurate, reliable and quality information.

CONCLUSION
Analysis of YouTube videos on IBP in the present study showed that the majority of the videos were of low quality and reliability. Videos uploaded by professional organizations/societies and academics contained more reliable and higher-quality information. It is likely that digitalization will make YouTube more prominent in the field of health, and viewers should be encouraged to exercise caution when approaching information received from that platform. Health authorities should be encouraged to work on social media practices involving the sharing of health information to protect public health. New studies on different topics conducted on social media platforms in the future will help raise patient awareness and support public health.

Table 2 Comparison of the reliability, content and quality of videos over the years.
JAMA, Journal of the American Medical Association benchmark criteria; GQS, Global Quality Score. Bold font indicates statistical significance.

Table 4 Video parameters according to years, and quality and reliability parameters (mean ± standard deviation).
n, Number of videos; SD, Standard Deviation; GQS, Global Quality Score; JAMA, Journal of the American Medical Association benchmark criteria; VPI, Video Power Index. Bold font indicates statistical significance (p < 0.05).

Kara et al. (2024), PeerJ, DOI 10.7717/peerj.17215
Table 6 Video parameters by continent and country.
Mann-Whitney U test in the analysis of continents, Kruskal-Wallis test in the analysis of countries. USA, United States of America; VPI, Video Power Index; SD, Standard Deviation; GQS, Global Quality Score; JAMA, Journal of the American Medical Association benchmark criteria. Bold font indicates statistical significance.